Text Analysis of Biden and Trump Speeches During the 2020 Presidential Election

Introduction

The United States presidential election is one of the most closely followed political events in the world, and many people study the data it generates, both to make predictions and to inform the public about the state of the race. In this blog post, we analyze one key component of the election process: the speeches given by the candidates. In particular, we analyze the text of speeches given by Joe Biden and Donald Trump in the lead-up to the 2020 election. We set out to answer three questions:

  1. What are the most common words and phrases used by Trump and Biden?
  2. What are the relationships between those words/phrases?
  3. How did the frequency of these words/phrases change over time?

Visualizations

To address these three questions, we created three types of visualizations, one for each question. To identify the most frequent words in each candidate's speeches, we created wordclouds in which a word's font size reflects how often it appears. To identify relationships between words, we created network graphs in which edge width reflects the “closeness” of two words within the speeches (we define “closeness” in the network section). Finally, we created line graphs to track how word frequencies changed over time.

Network Visualizations

To understand the relationships between words in the speeches, we looked at two sets of words: the most common words across all speeches, and words tied to popular election topics such as climate change, health care, and COVID-19. For each set, we needed a metric for “closeness”. To define one, we modeled our approach on an existing network analysis of Game of Thrones.
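One common way to measure closeness in Game of Thrones style networks is co-occurrence: count how often two names (here, two words) appear within a fixed number of tokens of each other. The sketch below assumes that definition; the function name `cooccurrence_counts` and the default window of 15 tokens are hypothetical choices for illustration, not necessarily the values we used.

```python
from collections import Counter

def cooccurrence_counts(tokens, vocab, window=15):
    """Count how often each pair of words in `vocab` appears within `window`
    tokens of each other. Hypothetical helper; the window size is assumed."""
    vocab = set(vocab)
    # Positions (in order) of the tokens we care about.
    hits = [(i, t) for i, t in enumerate(tokens) if t in vocab]
    counts = Counter()
    for a in range(len(hits)):
        i, w1 = hits[a]
        for b in range(a + 1, len(hits)):
            j, w2 = hits[b]
            if j - i > window:
                break  # hits are sorted by position, so later ones are farther away
            if w1 != w2:
                # Store each pair in alphabetical order so (x, y) == (y, x).
                counts[tuple(sorted((w1, w2)))] += 1
    return counts

tokens = "climate change is real and health care matters and climate policy".split()
pairs = cooccurrence_counts(tokens, {"climate", "health", "care"}, window=5)
print(pairs)
```

A smaller window rewards words used in the same phrase; a larger one rewards words used in the same passage, so the choice directly shapes which edges appear in the network.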

Text Mining

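The core text-mining step is turning raw transcripts into word and phrase counts. The following is a minimal Python sketch, assuming simple regex tokenization, a small hand-written stopword list, and two-word phrases (bigrams); the helper names and the stopword list are illustrative assumptions, and a real pipeline would likely use a fuller list such as NLTK's or spaCy's.

```python
import re
from collections import Counter

# Deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "we", "i", "you", "that", "this", "for", "on", "are", "be"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (keeping "don't")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_and_bigram_counts(text, stopwords=STOPWORDS):
    """Return (word counts, bigram counts) after dropping stopwords."""
    tokens = [t for t in tokenize(text) if t not in stopwords]
    words = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return words, bigrams

# A made-up sentence, not a quote from the speech data.
words, bigrams = word_and_bigram_counts(
    "We will beat the virus. We will build back better, and we will beat the virus."
)
print(words.most_common(3))
print(bigrams.most_common(2))
```

Because stopwords are removed before pairing, some bigrams join words that were not adjacent in the original text or that cross a sentence boundary; splitting transcripts into sentences first avoids this.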

Network Analysis for Most Commonly Used Words
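Given word counts and pairwise closeness counts, a network of the most common words can be reduced to a weighted edge list. The sketch below assumes both inputs are `Counter` objects with pair keys in alphabetical order; `top_word_network`, `top_n`, `min_weight`, and `max_width` are hypothetical names and defaults chosen for illustration.

```python
from collections import Counter

def top_word_network(word_counts, pair_counts, top_n=30, min_weight=2,
                     max_width=8.0):
    """Keep edges between the `top_n` most frequent words that co-occur at
    least `min_weight` times. Returns (word1, word2, weight, width) tuples,
    heaviest first, where `width` is scaled so the heaviest edge is
    `max_width` wide. All defaults are illustrative assumptions."""
    top = {w for w, _ in word_counts.most_common(top_n)}
    edges = [(a, b, n) for (a, b), n in pair_counts.items()
             if a in top and b in top and n >= min_weight]
    if not edges:
        return []
    heaviest = max(n for _, _, n in edges)
    return sorted(((a, b, n, max_width * n / heaviest) for a, b, n in edges),
                  key=lambda e: e[2], reverse=True)

# Made-up counts for illustration.
word_counts = Counter({"people": 10, "country": 8, "jobs": 6, "virus": 5, "great": 2})
pair_counts = Counter({("country", "people"): 6, ("jobs", "people"): 3,
                       ("great", "people"): 4, ("jobs", "virus"): 1})
network = top_word_network(word_counts, pair_counts, top_n=4, min_weight=2)
print(network)
```

The `(word1, word2, weight)` part of each tuple matches the triples that NetworkX's `Graph.add_weighted_edges_from` accepts, and the `width` values can be passed as edge widths when drawing, if NetworkX is the plotting route.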

Speech Analysis Over Time
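To compare word usage across time, raw counts need normalizing, since longer speeches would otherwise dominate. A minimal sketch, assuming each speech is a `(date, text)` pair with ISO `YYYY-MM-DD` dates and that the term is a single word; `term_frequency_by_month` is a hypothetical helper, and grouping by month is an illustrative choice.

```python
import re
from collections import defaultdict

def term_frequency_by_month(speeches, term):
    """Return {"YYYY-MM": occurrences of `term` per 1,000 words}, given a
    list of (iso_date, text) pairs. Single-word terms only."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for date, text in speeches:
        month = date[:7]  # "2020-03-10" -> "2020-03"
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[month] += len(tokens)
        hits[month] += tokens.count(term.lower())
    return {m: 1000 * hits[m] / totals[m] for m in sorted(totals) if totals[m]}

# Made-up snippets, not real transcripts.
speeches = [("2020-03-10", "jobs and the economy"),
            ("2020-03-20", "we need more jobs now and"),
            ("2020-04-02", "the virus is here")]
freq = term_frequency_by_month(speeches, "jobs")
print(freq)
```

The resulting dictionary maps directly onto a line graph, with months on the x-axis and the rate per 1,000 words on the y-axis.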

Word Frequency Wordclouds
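A wordcloud ties font size to word frequency. Libraries such as the `wordcloud` package handle this scaling internally (for example through its `relative_scaling` option), so the sketch below only illustrates the idea with a simple linear mapping; `font_sizes` and its size bounds are hypothetical, not the settings behind our figures.

```python
from collections import Counter

def font_sizes(word_counts, top_n=50, min_size=10, max_size=80):
    """Map the `top_n` most frequent words to font sizes that grow linearly
    with count, from `min_size` (least frequent kept word) to `max_size`
    (most frequent word). Bounds are illustrative assumptions."""
    top = word_counts.most_common(top_n)
    if not top:
        return {}
    top_count, bottom_count = top[0][1], top[-1][1]
    span = (top_count - bottom_count) or 1  # avoid dividing by zero if all counts match
    return {w: round(min_size + (max_size - min_size) * (c - bottom_count) / span)
            for w, c in top}

sizes = font_sizes(Counter({"america": 20, "jobs": 11, "virus": 2}))
print(sizes)
```

A linear mapping makes the single most frequent word visually dominant; square-root or logarithmic scaling is a common alternative when a few words have far higher counts than the rest.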

Limitations, Pitfalls, and Future Research